Our proposed digital exploration will analyze the linkage between economic factors and mental health trends within the wider Alberta population. The Mental Health Commission of Canada conducted an economic review in 2011 and provided an estimate of the impact of mental health illnesses on lost productivity due to absenteeism, presenteeism (present but less than fully productive at work) and turnover; in 2011 alone, the cost to the economy was 6.3 billion. This value is projected to rise to 16 billion in 2041. In any given year, 1 in 5 Canadians experiences a mental illness or addiction problem and by the time Canadians reach 40 years of age, 1 in 2 have, or have had, a mental illness. This means that more than 6.7 million people in Canada are living with a mental health problem or illness today. That is 19.8% of Canada’s population in any given year. It is likely that because there is stigma attached to harbouring a mental health diagnosis, that reported metrics are understated. According to the World Health Organization (WHO) the incidence of mental illness is expected to rise as economic drivers become increasingly dynamic, and the “gig economy” becomes more commonplace. The collective provincial population would benefit vastly from applied data analytics in order to address the compounding mental health crisis that is ongoing within the province. It is possible that in the future, applied analytics will be able to guide policy makers to utilize provincially budgeted resources in a more targeted and efficient manner.
On an aggregate level, it is well documented that the relationship between economic inequality and mental health exists, but despite this, a reductionist biomedical model assessing mental health on an individual and physiological basis has persisted within the academic medical community. This has limited the ability of corporate entities and policy makers to address inequalities within the mental health sphere. Another driving factor of mental health inequality has been economic volatility. The province of Alberta has experienced significant economic hardship following the collapse of Western Canadian Select (WCS) oil prices and NOVA/AECO-C gas prices in 2014/2015. Since that time, the energy market never fully recovered. The new commodity price environment also spurred a wider thematic global investment shift away from the energy industry and mounted pressure on corporate entities to support ESG driven narratives.
We knew going into the analysis that medically rooted data is safeguarded by medical ethics boards, and it would therefore more challenging to locate granular medical datasets. We hypothesized that given that economic data tends to be standardized over a wide time window, it would be prudent to conclude that our mental health datasets would be scope limiting. We were in fact correct. It was challenging to get standardized medical data over a long-time horizon, and thus challenging to uncover trends, but we did find patterns within our analysis. Our core dataset which we utilized alongside Alberta economic data came from the Canadian Community Health Survey (CCHS). Each year the CCHS works alongside Statistics Canada to gather health-related data at provincial levels of geography. The data lent itself to a subgroup analysis that we chose to look at alongside Alberta economic data that was available on the Alberta Economic Dashboard. While The Statistics Canada Database served as our core dataset, we also intended to utilize the Mental Health and Addictions Hospitalizations in Canada Supplementary Tables which are released annually by the Canadian Institute for Health Information (CIHI). Within the CIHI data, we utilized the ‘discharges’ and ‘discharge rates’ for mental health disorders, the data was segregated by province/territory. One hardship we came across while conducting our analysis with the CIHI data was that it was only standardized and reported consistently back to 2017. There are no aggregated datasets on the CIHI database, instead there is a report uploaded annually. This created extra wranging work and left us to work with a shorter timeframe. Despite this we still came to interesting conclusions that answered our original questions.
Our analysis will look to capture the essence of economic reality which exists within municipalities and the wider province, and overlay that theme with mental health related data. As a way of proceeding, we will clearly lay out our three guiding questions as follows:
The following data sources were utilized in our analysis. Please note that all data sources are considered open source and have been made public by government agencies, therefore we have legal permission to utilize the data. We abided by any attribution requirements we came across while conducting our analysis. Please note the filters on the Statistics Canada databases have been saved and provided below.
This is a large file and may be somewhat challenging to navigate... In order to ensure we have answered the questions in a CONCISE manner...
#import libraries
import pandas as pd
import matplotlib.pyplot as plt
import random
import numpy as np
import math
import datetime
import seaborn as sns
import os
import plotly
import plotly.express as px
from plotly.subplots import make_subplots
import ipywidgets as widgets
from ipywidgets import interact
import plotly.graph_objs as go
from sklearn import preprocessing
plotly.offline.init_notebook_mode()
# The following URL is used to call that data from this github repositry
githubRepo = 'https://raw.githubusercontent.com/Jonpeters3/601_Project/main/'
#make directory for images
if not os.path.exists("images"):
os.mkdir("images")
The original dataset that was provided by Stats Canada was a simple table with indicators such as 'perceived mental health' broken into either fair/poor or good/excellent. The dataset also contains the percentage change in respondants who responded that way on the survey. However, due the current structure, the data needed to be wrangled in order to provide a year-to-year change across the rows, with our indicators making up the columns. This required the use of a transpose as well as some trimming and merging in order to make the data table was usable for the necessary analysis.
main_MH =pd.read_csv(githubRepo + "CanadaMentalHealth.csv")
main_MH["Title"] = main_MH["Ag group"] + main_MH["Indicators"] # create groups based on age and indicators to simplyify data
main_MH.drop(columns = ['Ag group', 'Indicators'], inplace = True)
#rotating the dataframe was the simplest solution for getting the infomration where we needed it.
main_MH = main_MH.transpose()
main_MH.rename(columns=dict(main_MH.iloc[-1,:]), inplace = True)
main_MH.drop(main_MH.tail(1).index,inplace=True)
#Here we are basically timming off the bottom portion of the dataframe and joining them as new columns to the dataset
tempDF = main_MH.iloc[6:12,]
tempDF.index = range(2015,2021)
#The new columns are the % change for each age group, the notation '%age(+ or -)' for positive or negative perceptions
tempDF.columns=['%Total+', '%Total-','%1217+', '%1217-','%1834+', '%1834-','%3549+', '%3549-', '%5064+', '%5064-', '%65+', '%65-']
main_MH = main_MH.drop(main_MH.tail(6).index)
main_MH['year'] = main_MH.index
tempDF['year'] = tempDF.index
tempDF['year'] = tempDF['year'].astype('int64')
main_MH['year'] = main_MH['year'].astype('int64')
main_MH = main_MH.merge(tempDF, on='year')
main_MH.index = range(2015,2021)
main_MH.head(4)
| Total, 12 yars and ovrPrcivd mntal halth, vry good or xcllnt 6 | Total, 12 yars and ovrPrcivd mntal halth, fair or poor 6 | 12 to 17 yarsPrcivd mntal halth, vry good or xcllnt 6 | 12 to 17 yarsPrcivd mntal halth, fair or poor 6 | 18 to 34 yarsPrcivd mntal halth, vry good or xcllnt 6 | 18 to 34 yarsPrcivd mntal halth, fair or poor 6 | 35 to 49 yarsPrcivd mntal halth, vry good or xcllnt 6 | 35 to 49 yarsPrcivd mntal halth, fair or poor 6 | 50 to 64 yarsPrcivd mntal halth, vry good or xcllnt 6 | 50 to 64 yarsPrcivd mntal halth, fair or poor 6 | ... | %1217+ | %1217- | %1834+ | %1834- | %3549+ | %3549- | %5064+ | %5064- | %65+ | %65- | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015 | 2555000 | 210100 | 203000 | 8900 | 782600 | 70500 | 644900 | 54400 | 612000 | 55000 | ... | 78.8 | 3.4 | 72.9 | 6.6 | 73.5 | 6.2 | 77.9 | 7.0 | 72.8 | 5.0 |
| 2016 | 2507000 | 246700 | 204100 | 12300 | 760500 | 84900 | 667300 | 52700 | 560000 | 70700 | ... | 79.3 | 4.8 | 71.4 | 8.0 | 74.6 | 5.9 | 70.9 | 8.9 | 71.6 | 5.9 |
| 2017 | 2430700 | 245600 | 206700 | 15200 | 695400 | 96300 | 646500 | 54200 | 553300 | 57600 | ... | 77.4 | 5.7 | 65.4 | 9.1 | 71.3 | 6.0 | 70.1 | 7.3 | 71.4 | 4.8 |
| 2018 | 2431600 | 275200 | 201700 | 23200 | 682700 | 104700 | 642000 | 74800 | 555700 | 51500 | ... | 73.8 | 8.5 | 65.8 | 10.1 | 69.1 | 8.0 | 70.1 | 6.5 | 71.0 | 4.3 |
4 rows × 25 columns
When looking for additional data and information to accompany the above dataset, we managed to come across a more accurate data set that provided more information already provided in a usable 'database download' format. Likewise, this data set provided more information that provided a deeper understanding of current mental health makeup such as 'Perceived health' and 'Perceived life stress'. This data is broken up into two parts, self reported mental health status for the whole country and for the province of Alberta. With this in mind, it was decided that the below dataset would replace the previously used dataset and become the 'main' mental health dataset. The previous dataset is kept within as it is still useful information that is usable later on within the report, such as the precent change in perceived mental health. Reported below is the available mental health indicators for both Alberta and Canada as a whole so that there an idea what is being reported and what is usable in identifying key trends.
ABreportedMH = pd.read_csv(githubRepo + "StatCan%20Mental%20Health%20Data/Alberta_StatCan_MentalHealthData_updated.csv") #read in data from github repo
CAreportedMH = pd.read_csv(githubRepo + "StatCan%20Mental%20Health%20Data/Canada_StatCan_MentalHealthData_updated.csv") #read in data from github repo
#Data is broken up into various subgroups (like male and female)
# at this point, we are only interested in the mental health indicators
# We group by these indicators, then drop other unneccesaary information
AB_MH_graph = ABreportedMH.groupby(['REF_DATE', 'Indicators', ]).sum().drop(['UOM_ID', 'SCALAR_ID','SYMBOL', 'TERMINATED', 'DECIMALS'], axis = 1)
CA_MH_graph = ABreportedMH.groupby(['REF_DATE', 'Indicators', ]).sum().drop(['UOM_ID', 'SCALAR_ID','SYMBOL', 'TERMINATED', 'DECIMALS'], axis = 1)
AB_MH_graph.reset_index(inplace=True)
CA_MH_graph.reset_index(inplace=True)
display(AB_MH_graph['Indicators'].unique()) #display of AB mental health survey responses that remain within the AB_MH_graph dataframe
display(CA_MH_graph['Indicators'].unique()) #display of CA mental health survey responses that remain within the CA_MH_graph dataframe
array(['Contact with a medical doctor in the past 12 months',
'Current smoker, daily', 'Current smoker, daily or occasional',
'Heavy drinking', 'Life satisfaction, satisfied or very satisfied',
'Mood disorder', 'Perceived health, fair or poor',
'Perceived health, very good or excellent',
'Perceived life stress, most days quite a bit or extremely stressful',
'Perceived mental health, fair or poor',
'Perceived mental health, very good or excellent',
'Sense of belonging to local community, somewhat strong or very strong'],
dtype=object)
array(['Contact with a medical doctor in the past 12 months',
'Current smoker, daily', 'Current smoker, daily or occasional',
'Heavy drinking', 'Life satisfaction, satisfied or very satisfied',
'Mood disorder', 'Perceived health, fair or poor',
'Perceived health, very good or excellent',
'Perceived life stress, most days quite a bit or extremely stressful',
'Perceived mental health, fair or poor',
'Perceived mental health, very good or excellent',
'Sense of belonging to local community, somewhat strong or very strong'],
dtype=object)
Below is the Alberta Mental Health dataframe which is used to compare mental health data to the Alberta Activity Index (which is discussed below)
AB_MH_graph.head(4)
| REF_DATE | Indicators | VALUE | |
|---|---|---|---|
| 0 | 2015 | Contact with a medical doctor in the past 12 m... | 2493538.7 |
| 1 | 2015 | Current smoker, daily | 466412.1 |
| 2 | 2015 | Current smoker, daily or occasional | 632848.7 |
| 3 | 2015 | Heavy drinking | 638442.4 |
Below is the Canada Mental Health dataframe which is used to compare mental health data to the Alberta Activity Index (which is discussed below)
CA_MH_graph.head(4)
| REF_DATE | Indicators | VALUE | |
|---|---|---|---|
| 0 | 2015 | Contact with a medical doctor in the past 12 m... | 2493538.7 |
| 1 | 2015 | Current smoker, daily | 466412.1 |
| 2 | 2015 | Current smoker, daily or occasional | 632848.7 |
| 3 | 2015 | Heavy drinking | 638442.4 |
With the goal of understanding the general economic trends within Alberta overall, data provided by the Government of Alberta was utilized and aggregated on a year-to-year basis in order to compare similar trends to the above mental health data. The Alberta Activity Index is a monthly weighted average of 9 different indicators (employment, average weekly earnings, retail trade, wholesale trade, manufacturing, new truck sales, housing starts, rigs drilling and oil production). The Alberta Activity Index (AAX) is a the most well-balanced index that we could find that we felt could closely track economic activity. The AAX was developed by Alberta Treasury and Finance Board. This data file is updated monthly. See the related tab for a link to the AAX web page which includes highlights and a chart. This data provides a reference for very general economic trends within Alberta, but deeper economic impacts will be explored later in this report. As can be seen in the following block of code, the data is wrangled and filtered to only include the time window which corresponds to the mental health data set described above (years 2015 - 2020).
AA = pd.read_csv(githubRepo + "ActivityIndex.csv")
AA.dropna(axis=1, inplace=True) # due to an issue, there is a lot of nan columns produced, we remove them here
AA['year'] = pd.DatetimeIndex(AA['Date']).year # add year/date information
AA['month'] = pd.DatetimeIndex(AA['Date']).month # add month/date information
AA.columns=['date', 'AA', 'year', 'month']
AA = AA[AA['year'] >= 2015] #subset based on years provided in the mental health dataset
AA = AA[AA['year'] < 2021]
AA.groupby('year', as_index=False).aggregate('mean').drop(['month'], axis = 1) #aggregate mean, for viewing purposes only
AA['date2'] = AA['month'].astype(str) + '/' + AA['year'].astype(str)
Test = px.line(AA, x="date2", y="AA", title='Alberta Economic Index',
labels={"date2": "Date (Years)","AA": "Economic Index"})
Test.update_layout(height=500, width=500)
Test.update_xaxes(nticks=6, tickangle=45)
Test.show()
Test.write_image("images/AlbertaIndex.png")
As can be seen above, the Alberta Economic Index starts with a dip in 2015 following the collapse of oil and gas prices. However, the economy recovered in later years, but saw gradual decline between 2018 and 2019, with a large dip in 2020 following the COVID-19 pandemic.
To compare these trends to those of general mental health trends, a 2x2 grid of line graphs was created to represent overall trends in mental health and the economy. Specifically used to showcase mental health trends are the aforementioned survey results from the Canadian Community Health Survey (CCHS) which measured categories such as 'Perceived life stress,' 'Perceived mental health, 'sense of belonging,' 'heavy drinking,' and 'mood disorders.' As mentioned above, we loaded in dataframes for both Canada as a whole, as well as Alberta in an effort to highlight provincial differences from the national averages. We refrained from displaying national data while answering question one, as question one in particular has a sole focus on Alberta.
Stackedfig = make_subplots(rows=2, cols=2)
# in an effort to save space,this function adds a trace to the graph and have the option to
# include the trace on the legend or not, this allows us to create multiple traces without rewriting the add_trace function.
def addTraceMH(df, ind, r = 1, c = 1, col = 'red', sl=True):
if sl == True:
MHPlot.append_trace(go.Scatter(
x=df[df["Indicators"] == ind]['REF_DATE'],
y=df[df["Indicators"] == ind]['VALUE'],
name = ind, legendgroup=ind, marker=dict(color=col)), row=r, col=c)
else:
MHPlot.append_trace(go.Scatter(
x=df[df["Indicators"] == ind]['REF_DATE'],
y=df[df["Indicators"] == ind]['VALUE'],
name = ind, legendgroup=ind, marker=dict(color=col), showlegend = False), row=r, col=c)
#setting up the bar bones of the graph
MHPlot = make_subplots(rows=2, cols=2,
x_title='Date (Years)',
y_title='Number of Reports',
subplot_titles=("Alberta Positive Mental Health",
"Canada Positive Mental Health",
"Alberta Negative Mental Health",
"Canada Negative Mental Health"))
#Alberta positive indicators
addTraceMH(df = AB_MH_graph, ind = 'Perceived mental health, very good or excellent', r = 1, c = 1, col = 'green')
addTraceMH(df = AB_MH_graph, ind = 'Life satisfaction, satisfied or very satisfied', r = 1, c = 1, col = 'orange')
addTraceMH(df = AB_MH_graph, ind = 'Sense of belonging to local community, somewhat strong or very strong', r = 1, c = 1, col = 'pink')
#Canada positive indicators
addTraceMH(df = CA_MH_graph, ind = 'Perceived mental health, very good or excellent', r = 1, c = 2, col = 'green', sl=False)
addTraceMH(df = CA_MH_graph, ind = 'Life satisfaction, satisfied or very satisfied', r = 1, c = 2, col = 'orange', sl=False)
addTraceMH(df = CA_MH_graph, ind = 'Sense of belonging to local community, somewhat strong or very strong', r = 1, c = 2, col = 'pink', sl=False)
# Alberta Negative indicators
addTraceMH(df = AB_MH_graph, ind = 'Perceived life stress, most days quite a bit or extremely stressful', r = 2, c = 1, col = 'red')
addTraceMH(df = AB_MH_graph, ind = 'Perceived mental health, fair or poor', r = 2, c = 1, col = 'blue')
addTraceMH(df = AB_MH_graph, ind = 'Mood disorder', r = 2, c = 1, col = 'black')
addTraceMH(df = AB_MH_graph, ind = 'Heavy drinking', r = 2, c = 1, col = 'grey')
# Canada Negative indicators
addTraceMH(df = CA_MH_graph, ind = 'Perceived life stress, most days quite a bit or extremely stressful', r = 2, c = 2, col = 'red', sl=False)
addTraceMH(df = CA_MH_graph, ind = 'Perceived mental health, fair or poor', r = 2, c = 2, col = 'blue', sl=False)
addTraceMH(df = CA_MH_graph, ind = 'Mood disorder', r = 2, c = 2, col = 'black', sl=False)
addTraceMH(df = CA_MH_graph, ind = 'Heavy drinking', r = 2, c = 2, col = 'grey', sl=False)
# Albert Economic Index
MHPlot.update_layout(legend=dict(
orientation = 'h', xanchor = "center", x = 0.5, y= .575,
bordercolor="Black",
borderwidth=1 ))
MHPlot.update_layout(height=900, width=900, title_text="Mental Health Versus Alberta Economy")
MHPlot.show()
MHPlot.write_image("images/MHvsEC.png")
Alberta's Economy hit a peak in the year 2018, which was followed by a decline in the ladder half of 2019 (this can best been seen in the Alberta Activity Index). The largest dip occured not in 2015/2016 following the collapse of oil and gas, but in 2020 following the beginning of the COVID-19 pandemic, with the index dropping roughly 40 points. Over the same time horizon we see a gradual decrease in the number of people who rated their mental health as 'very good or excellent'. This is significant as the population of Alberta grew steadily in the years from 2014-2017. An interesting finding was that people's "sense of belonging to the local community" was quite resilient in the wake of economic volatility and has persistently risen over time. This steady increase however may be the result of a growing population.
Although correlation does not infer casuation, it is hard to ignore the overall rises in stress and poor mental health reports following the two major economic collapses in Alberta within this past decade. As a hypothesis, that will be addressed in question two, we thought the economic slowdowns would most likely effect those within the working age range of 18-65. This demographic has historically been the most likely to be impacted by dramatic changes in overall economic activity. This is typcially the age range with the most financial responsibility with kids in college, a mortgage, and much more.
Although the Alberta Activity index has been useful in identifying overall trends, the need for more specific information is needed in order to definitively infer any influential factors within the economy. Data taken from Alberta's economic dashboard allowed us to view provincial wide economic indicators over several years. We were interested in being able to visualize the mass amount of information provided in a subsettable format. Below, we used the plotly library in an effort to create interactive graphs that allowed us to view all economic indicator information that we needed in an effort to understand major economic trends throughout various sectors. Because the Alberta Economic Dashboard contained .csv files with a somewhat standardized format, it allowed us to write a for loop to read in and wrangle multiple files all in one step.
## Economic Indicators
# to read the data in from github, we will iterate over these file names to call and read in each file.
filesList = ['Unemployment.csv', 'CattlePrice.csv', 'NetMigration.csv', 'Population.csv', 'Productivity.csv',
'Restaurant.csv', 'VehicleSales.csv', 'WCSOilPrices.csv', 'WeeklyIncome.csv']
econDF = pd.DataFrame(columns=['When','Alberta', 'Sector'])
# Loop through our economic file location and read all csv's and append them to each other
# There are only 3 columns we need, Date, the recorded value ('Alberta'), and subsector name
for files in filesList:
df = pd.read_csv(githubRepo + "Economic/{0}".format(files))
df['When'] = pd.to_datetime(df['When'])
tempDF = df[["When", "Alberta"]] # Get When and Alberta columns
if files != 'Population.csv':
tempDF = tempDF.join(df.iloc[:,2], lsuffix='_left', rsuffix='_right') # Get second column
else:
tempDF = tempDF.assign(NewCol='Population')
tempDF.columns = ["When", "Alberta", "SubSector"]
tempDF = tempDF.assign(Sector=files.split('.')[0])
econDF = econDF.append(tempDF) # append to final df
#Get date infomration for the upcoming graphs
econDF['Month'] = econDF['When'].dt.month
econDF['Day'] = econDF['When'].dt.day
econDF['Year'] = econDF['When'].dt.year
econDF['Sector2'] = econDF['SubSector'].astype(str) + " (" + econDF['Sector'].astype(str) + ")"
On Alberta's economic dashboard, data is broken up between sectors as well as subsectors. The following dashboard helped to identify key events throughout various sectors during 2015-2020. We encourage the reader to load our file on jupyter so they can toggle the drop down menu to see the assortment of granular economic measures which we plotted. The drop down menu includes data items such as Cattle Price, Net Migration, Population, Productivity, Restaurant Sales (discretionary spending), Unemployment, Vehicle Sales (discretionary spending), Western Canadian Select Oil Prices, and Weekly Income. The aim is to find a more granular conclusion on economic factors instead of relying solely on the Alberta Activity Index. By comparing each of these subsectors to the previously identified mental health trends presented above and below, clear relationships can be uncovered (discussed below).
Interactive features for the below dashboard are not available in HTML. To utilize these features, please use jupyter notebook. We "okay-ed" this with the instructor.
graphDF2 = econDF.groupby(['Year', 'Month', 'Sector']).mean().drop('Day', axis = 1)
li = go.Scatter()
fig_line = go.FigureWidget(data=li)
# A list passed to interact() will yield a drop-down list for sectors
@interact(sector = list(econDF['Sector'].unique()))
def update_line(sector):
# subset data based on sector
graphDF2 = econDF[econDF['Sector'] == sector]
graphDF2 = graphDF2.groupby(['Year', 'SubSector']).mean().drop(['Day', 'Month'], axis = 1)
graphDF2.reset_index(inplace = True)
data = graphDF2 #save to a new df in order to avoid df issues when graphing
fig_line.data = [] # clear previous traces within the plot range
#for each sector add a trace for the subsector
for subsector in data['SubSector'].unique():
data1 = data[data['SubSector'] == subsector]
fig_line.add_trace(
go.Scatter(x=data1['Year'],
y=data1['Alberta'],
name = subsector))
# Add titles and lebels
fig_line.update_layout(title_text="Year-to-year Results for {0}".format(sector),
xaxis_title = 'Year',
yaxis_title = 'Reported Activity')
# for the dashboard to work in jupyer, use this call below
# fig_line
# for html markdown, use .show() for them to appear
fig_line.show()
#fig_line.write_image("images/Dash1.png")
Breaking down all the information available, there are 3 key trends that could contribute the most to the economic downturn as well as indications of declining mental health:
Firstly, Oil Prices have been increasing since the early 2000's, with rapid growth around 2008. Rapid increases and decreases in oil prices are a doubled-edged sword on Alberta’s economy. With oil and gas being one of the largest means of production in the province, increases in commodity prices has the potential to make the cost of living unmanageable. During these times, it costs more to fill your car and heat your home. Likewise, the rapid decrease in commodity prices signal the decline in the overall economic situation within Alberta, this is primarily due to the concentration of employment within the oil and gas sector in Alberta. Rapid price drops signal lost jobs and lost wages, and overall, less money and more monetary stress felt at home.
Secondly, around 2013, the price of cattle skyrocketed within the province. This economic subsector is felt directly in the home as the cost of living and food is directly related to the price of meats in the region. Alberta, known for their beef, saw rapid increases in food prices, with prices peaking in 2015. This was an interesting trend, that may have a more distant linkage to mental health trends, but we thought it was worth mentioning.
Lastly, unemployment rates seem to be the most volatile indicator within the Alberta economy, as unemployment rates have fluctuated up and down since the mid-2000's. With peaks in 2009, 2016, and 2020, unemployment seems to follow right behind the drop in key indicators. Unemployment clearly has the largest effect on stress and mental health. Without employment, the fluctuation of key prices creates monetary stress which can lead to mental health difficulties for families in the province.
Visualizing each individual sub-sector on a month-to-month basis will identify how individual subsectors fluctuate with time as well, this dives a level deeper compared to the above line graphs. This allowed us to backtrack to key dates in Alberta's economic history and track how differing sectors changed on a month-to-month basis. The monthly charts also allowed us to visualize any interesting points or disparities that we noticed from the above line plot for economic indicators.
The following interactive bar graph helped to identify key events throughout various sectors during specific months within the key 2015-2020 timeframe. We encourage the reader to load our file on jupyter so they can toggle the drop down menu to see the assortment of granular economic measures which we plotted. The drop down menu includes data items such as Motor Vehicle Sales, Natural Gas Pricing, and Weekly Income Metrics from various subsectors such as food services, waste management, and entertainment.
Interactive features for the below dashboard are not available in HTML. To utilize these features, please use jupyter notebook. We "okay-ed" this with the instructor.
graphDF = econDF.groupby(['Year', 'Month', 'Sector']).mean().drop('Day', axis = 1)
#create our plot
bar = go.Bar()
fig_bar = go.FigureWidget(data=bar)
#used to replace month labels on the x-axis
month_labels = {1:'Jan',2:'Feb',3:'Mar',4:'Apr',5:'May', 6:'Jun', 7:'Jul', 8:'Aug', 9:'Sep', 10:'Oct', 11:'Nov', 12:'Dec'}
# drop down lists for year and sectors
@interact(year=[*range(max(econDF['Year']), min(econDF['Year']),-1)],
sector = list(econDF['Sector2'].unique()))
def update_bar(sector, year=2020):
#create a new df to avoid overlap issues
graphDF = econDF[econDF['Sector2'] == sector]
# use min or max year for subsector if data does not reach that far out.
if year < min(graphDF['Year']):
year = min(graphDF['Year'])
elif year > max(graphDF['Year']):
year = max(graphDF['Year'])
#aggregate data
graphDF = graphDF.groupby(['Year', 'Month']).mean().drop('Day', axis = 1)
data = graphDF.loc[year].mean(axis=1,skipna=True)
#Add bars with series values
fig_bar.update_traces(x=pd.Series(data.index.values).values,
y=data.values)
#Add title and labels
fig_bar.update_layout(title_text="{0} results for {1}*".format(sector, year),
xaxis_title = 'Month',
yaxis_title = 'Reported Activity',
xaxis = dict(
tickmode = 'array',
tickvals = [*month_labels.keys()],
ticktext = [*month_labels.values()]
))
#annotation for limits on years
note = '*If data for the year selected is not available for this subsector, the mininum (or maximum) year available will be used'
fig_bar.add_annotation(
showarrow=False, text=note,
xref='x domain', x=.5,
yref='y domain', y=-.2 )
# for the dashboard to work in jupyer, use this call below
#fig_bar
# for html markdown, use .show() for them to appear
fig_bar.show()
#fig_bar.write_image("images/Dash2.png")
Looking back on the key economic indicators from above, we find that although we saw significant spikes for unemployment in the years 2009 and 2016, there was no key contributing month of the year. Both years saw a steady incline in unemployment through the whole year. Meanwhile, 2020 saw a significant spike in April due to the COVID-19 pandemic. Oil prices saw a dramatic rise in 2009 with prices of WTI almost doubling from 40\$ to 80\\$, with the prices never really dipping below 60\$ after that point. and lastly, we find that the price of cattle slaughter likewise had a gradual gain, peaking in May at 193\\$.
The sudden and rapid changes in oil prices hit Albertans hard, and the mental health suffered due to that fact. Economic indicators directly related to day-to-day living have been found to correlate to the mental health status within the province in those same given years. With this, it is concluded that there exists an identifiable relationship between the dynamic economic situation of the province and the wider mental health trend within Alberta. Now, it is necessary to identify which subpopulations are most directly impacted by changes in the economic status of the province.
Below we utilize the large mental health datasets available from the Canadian Community Health Survey (CCHS) used in the previous question:
#Read in datasets
Canada_main_frame = pd.read_csv(githubRepo + 'StatCan%20Mental%20Health%20Data/Canada_StatCan_MentalHealthData_updated.csv')
Alberta_main_frame = pd.read_csv(githubRepo + 'StatCan%20Mental%20Health%20Data/Alberta_StatCan_MentalHealthData_updated.csv')
The Canadian main dataframe was read in as a csv file in a "database format" that was able to be produced on the Statistics Canada webpage:
Canada_main_frame.head(4)
| REF_DATE | GEO | DGUID | Age group | Sex | Indicators | Characteristics | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | Canada (excluding territories) | NaN | 12 to 17 years | Males | Perceived health, very good or excellent | Number of persons | Number | 223 | units | 0 | v110788271 | 1.2.2.1.1 | 812000.0 | NaN | NaN | NaN | 0 |
| 1 | 2016 | Canada (excluding territories) | NaN | 12 to 17 years | Males | Perceived health, very good or excellent | Number of persons | Number | 223 | units | 0 | v110788271 | 1.2.2.1.1 | 841800.0 | NaN | NaN | NaN | 0 |
| 2 | 2017 | Canada (excluding territories) | NaN | 12 to 17 years | Males | Perceived health, very good or excellent | Number of persons | Number | 223 | units | 0 | v110788271 | 1.2.2.1.1 | 866700.0 | NaN | NaN | NaN | 0 |
| 3 | 2018 | Canada (excluding territories) | NaN | 12 to 17 years | Males | Perceived health, very good or excellent | Number of persons | Number | 223 | units | 0 | v110788271 | 1.2.2.1.1 | 836400.0 | NaN | NaN | NaN | 0 |
The Alberta main dataframe was read in as a csv file in a "database format" that was able to be produced on the Statistics Canada webpage:
Alberta_main_frame.head(4)
| REF_DATE | GEO | DGUID | Age group | Sex | Indicators | Characteristics | UOM | UOM_ID | SCALAR_FACTOR | SCALAR_ID | VECTOR | COORDINATE | VALUE | STATUS | SYMBOL | TERMINATED | DECIMALS | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | Alberta | 2016A000248 | 12 to 17 years | Males | Perceived health, very good or excellent | Number of persons | Number | 223 | units | 0 | v110817755 | 10.2.2.1.1 | 104100.0 | NaN | NaN | NaN | 0 |
| 1 | 2016 | Alberta | 2016A000248 | 12 to 17 years | Males | Perceived health, very good or excellent | Number of persons | Number | 223 | units | 0 | v110817755 | 10.2.2.1.1 | 106300.0 | NaN | NaN | NaN | 0 |
| 2 | 2017 | Alberta | 2016A000248 | 12 to 17 years | Males | Perceived health, very good or excellent | Number of persons | Number | 223 | units | 0 | v110817755 | 10.2.2.1.1 | 109900.0 | NaN | NaN | NaN | 0 |
| 3 | 2018 | Alberta | 2016A000248 | 12 to 17 years | Males | Perceived health, very good or excellent | Number of persons | Number | 223 | units | 0 | v110817755 | 10.2.2.1.1 | 108500.0 | NaN | NaN | NaN | 0 |
In the first portion of the analysis it was useful to cut out unncessary detail so it is possible to create workable sub dataframes. As you can see below, we dropped the columns Characteristics, GEO, DGUID, UOM, UOM_ID, SCALAR_FACTOR, SCALAR_ID, VECTOR, COORDINATE, STATUS, SYMBOL, TERMINATED, and DECIMALS. As well we filtered the data to only include rows that had a value of less than 100. The reason why this was done is because the data held within the Canada_main_frame and Alberta_main_frame included both "values" or counts of responses to the survey, as well as percentages of the total responses (a number between 0-100). In order to compare the data temporally (or across time), it made more sense to utilize the percentage data... this way we could assess if there were any trends in the relative amount of citizens who answered a certain way on the survey. Following the filter that was applied to the value column, we filtered values that included the string "Percieved" within the data, this was the cleanest way of selecting only the survey items titled "Percieved health, fair or poor", "Perceived health, very good or excellent", "Percieved life stress, most days quite a bit or extremely stressful", "Perceived mental health, fair or poor", and "Perceived mental health, fair or excellent".
#drop columns to make it more manageable
Canada_main_frame_cut = Canada_main_frame.drop(columns=['Characteristics',
'GEO',
'DGUID',
'UOM',
'UOM_ID',
'SCALAR_FACTOR',
'SCALAR_ID',
'VECTOR',
'COORDINATE',
'STATUS',
'SYMBOL',
'TERMINATED',
'DECIMALS'],axis=1)
#filter to only include percentages and only include values under 100 (which would encompass percentage values only based on the .csv file)
Canada_main_frame_cut = Canada_main_frame_cut[(Canada_main_frame_cut["VALUE"] < 100)]
#filter to only include specific indicators (percieved health categories) as reported in the CCHS survey
Canada_main_frame_cut = Canada_main_frame_cut[Canada_main_frame_cut['Indicators'].str.contains("Perceived")]
We implemented the same wrangling measures on the Alberta main dataframe as the Canada main dataframe
#drop columns to make it more manageable
Alberta_main_frame_cut = Alberta_main_frame.drop(columns=['Characteristics',
'GEO',
'DGUID',
'UOM',
'UOM_ID',
'SCALAR_FACTOR',
'SCALAR_ID',
'VECTOR',
'COORDINATE',
'STATUS',
'SYMBOL',
'TERMINATED',
'DECIMALS'],axis=1)
#filter to only include percentages and only include values under 100 (which would encompass percentage values only based on the .csv file)
Alberta_main_frame_cut = Alberta_main_frame_cut[(Alberta_main_frame_cut["VALUE"] < 100)]
#filter to only include specific indicators (percieved health categories) as reported in the CCHS survey
Alberta_main_frame_cut = Alberta_main_frame_cut[Alberta_main_frame_cut['Indicators'].str.contains("Perceived")]
In order to adequately answer our second question...
It was decided to create 10 "sub-dataframes" within both the Canada main dataframe and 8 "sub-dataframes" within the Alberta main dataframe... which contains the percentage of total respondants who responded that way on the survey for each subcategory of age and sex. The following "sub-dataframes" were utilized to create heatmaps using the Seaborn library. This allowed us to more clearly analyze the trend in mental health over the past six years in both Canada and Alberta. The reason why we did not included data for ages 12-17 within Alberta, was because the dataset was so small it was noted as being estimated by statistics canada. We felt that the population aged 12 - 17 would not be the focal point of our subgroup analysis, so we chose to only include the full dataset which was not estimated for the entire country of Canada.
The following sub-dataframes were created for Canada:
1) FEMALE 12 - 17
2) MALE 12 - 17
3) FEMALE 18 - 34
4) MALE 18 - 34
5) FEMALE 35 - 49
6) MALE 35 - 49
7) FEMALE 50 - 64
8) MALE 50 - 64
9) FEMALE 65+
10) MALE 65+
The following sub-dataframes were created for Alberta:
1) FEMALE 18 - 34
2) MALE 18 - 34
3) FEMALE 35 - 49
4) MALE 35 - 49
5) FEMALE 50 - 64
6) MALE 50 - 64
7) FEMALE 65+
8) MALE 65+
In order to create the sub-dataframes which originate from the Canada_main_frame, we filter out rows by utilzing the attributes "Age Group" and "Sex". In order to filter out aged 12-17 females for example, we return a dataframe that only has values of '12 to 17 years' for the attribute "Age Group" and a value of "Females" for the attribute "Sex". Following this filtering method, we decided to utilize the pivot table function to display the value or percentage for each "Indicator". The pivot table reshapes data or produces a “pivot” table based on column values and uses unique values from specified index / columns to form axes of the resulting DataFrame. Following that we decided to transpose the dataframe so that it was more easily auditable and in the same alignment as the seaborn heat maps that we created by utilizing this data. The following block of code is for females within the Canada main dataframe
#Canada Females
#FEMALE AGED 12-17
Canada_12to17F = Canada_main_frame_cut[(Canada_main_frame_cut["Age group"] == '12 to 17 years') & (Canada_main_frame_cut["Sex"] == 'Females')] #filter to only include Age Group 12-17 years and Female sex
Canada_12to17F = Canada_12to17F.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Canada_12to17F = Canada_12to17F.transpose() #final chart input
#FEMALE AGED 18-34
Canada_18to34F = Canada_main_frame_cut[(Canada_main_frame_cut["Age group"] == '18 to 34 years') & (Canada_main_frame_cut["Sex"] == 'Females')] #filter to only include Age Group 18-34 years and Female sex
Canada_18to34F = Canada_18to34F.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Canada_18to34F = Canada_18to34F.transpose() #final chart input
#FEMALE AGED 35-49
Canada_35to49F = Canada_main_frame_cut[(Canada_main_frame_cut["Age group"] == '35 to 49 years') & (Canada_main_frame_cut["Sex"] == 'Females')] #filter to only include Age Group 35-49 years and Female sex
Canada_35to49F = Canada_35to49F.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Canada_35to49F = Canada_35to49F.transpose() #final chart input
#FEMALE AGED 50-64
Canada_50to64F = Canada_main_frame_cut[(Canada_main_frame_cut["Age group"] == '50 to 64 years') & (Canada_main_frame_cut["Sex"] == 'Females')] #filter to only include Age Group 50-64 years and Female sex
Canada_50to64F = Canada_50to64F.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Canada_50to64F = Canada_50to64F.transpose() #final chart input
#FEMALE AGED 65+
Canada_65plusF = Canada_main_frame_cut[(Canada_main_frame_cut["Age group"] == '65 years and over') & (Canada_main_frame_cut["Sex"] == 'Females')] #filter to only include Age Group 65+ years and Female sex
Canada_65plusF = Canada_65plusF.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Canada_65plusF = Canada_65plusF.transpose() #final chart input
The methods applied for the males within the Canada main dataframe is the exact same as the method utilized for females... filter on age and sex, followed by the pivot table function, and a transpose...
#Canada Males
#MALE AGED 12-17
Canada_12to17M = Canada_main_frame_cut[(Canada_main_frame_cut["Age group"] == '12 to 17 years') & (Canada_main_frame_cut["Sex"] == 'Males')] #filter to only include Age Group 12-17 years and Male sex
Canada_12to17M = Canada_12to17M.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Canada_12to17M = Canada_12to17M.transpose() #final chart input
#MALE AGED 18-34
Canada_18to34M = Canada_main_frame_cut[(Canada_main_frame_cut["Age group"] == '18 to 34 years') & (Canada_main_frame_cut["Sex"] == 'Males')] #filter to only include Age Group 18-34 years and Male sex
Canada_18to34M = Canada_18to34M.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Canada_18to34M = Canada_18to34M.transpose() #final chart input
#MALE AGED 35-49
Canada_35to49M = Canada_main_frame_cut[(Canada_main_frame_cut["Age group"] == '35 to 49 years') & (Canada_main_frame_cut["Sex"] == 'Males')] #filter to only include Age Group 35-49 years and Male sex
Canada_35to49M = Canada_35to49M.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Canada_35to49M = Canada_35to49M.transpose() #final chart input
#MALE AGED 50-64
Canada_50to64M = Canada_main_frame_cut[(Canada_main_frame_cut["Age group"] == '50 to 64 years') & (Canada_main_frame_cut["Sex"] == 'Males')] #filter to only include Age Group 50-64 years and Male sex
Canada_50to64M = Canada_50to64M.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Canada_50to64M = Canada_50to64M.transpose() #final chart input
#MALE AGED 65+
Canada_65plusM = Canada_main_frame_cut[(Canada_main_frame_cut["Age group"] == '65 years and over') & (Canada_main_frame_cut["Sex"] == 'Males')] #filter to only include Age Group 65+ years and Male sex
Canada_65plusM = Canada_65plusM.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Canada_65plusM = Canada_65plusM.transpose() #final chart input
The methods applied for the females within the Alberta main dataframe is the exact same as the method utilized for the Canadian male and Canadian female sub-dataframes... filter on age and sex, followed by the pivot table function, and a transpose...
#Alberta Females
#FEMALE AGED 18-34
Alberta_18to34F = Alberta_main_frame_cut[(Alberta_main_frame_cut["Age group"] == '18 to 34 years') & (Alberta_main_frame_cut["Sex"] == 'Females')] #filter to only include Age Group 18-34 years and Female sex
Alberta_18to34F = Alberta_18to34F.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Alberta_18to34F = Alberta_18to34F.transpose() #final chart input
#FEMALE AGED 35-49
Alberta_35to49F = Alberta_main_frame_cut[(Alberta_main_frame_cut["Age group"] == '35 to 49 years') & (Alberta_main_frame_cut["Sex"] == 'Females')] #filter to only include Age Group 35-49 years and Female sex
Alberta_35to49F = Alberta_35to49F.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Alberta_35to49F = Alberta_35to49F.transpose() #final chart input
#FEMALE AGED 50-64
Alberta_50to64F = Alberta_main_frame_cut[(Alberta_main_frame_cut["Age group"] == '50 to 64 years') & (Alberta_main_frame_cut["Sex"] == 'Females')] #filter to only include Age Group 50-64 years and Female sex
Alberta_50to64F = Alberta_50to64F.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Alberta_50to64F = Alberta_50to64F.transpose() #final chart input
#FEMALE AGED 65+
Alberta_65plusF = Alberta_main_frame_cut[(Alberta_main_frame_cut["Age group"] == '65 years and over') & (Alberta_main_frame_cut["Sex"] == 'Females')] #filter to only include Age Group 65+ years and Female sex
Alberta_65plusF = Alberta_65plusF.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Alberta_65plusF = Alberta_65plusF.transpose() #final chart input
The methods applied for the males within the Alberta main dataframe is the exact same as the method utilized for the Canadian male and Canadian female sub-dataframes... filter on age and sex, followed by the pivot table function, and a transpose...
#Alberta Males
#MALE AGED 18-34
Alberta_18to34M = Alberta_main_frame_cut[(Alberta_main_frame_cut["Age group"] == '18 to 34 years') & (Alberta_main_frame_cut["Sex"] == 'Males')] #filter to only include Age Group 18-34 years and Male sex
Alberta_18to34M = Alberta_18to34M.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Alberta_18to34M = Alberta_18to34M.transpose() #final chart input
#MALE AGED 35-49
Alberta_35to49M = Alberta_main_frame_cut[(Alberta_main_frame_cut["Age group"] == '35 to 49 years') & (Alberta_main_frame_cut["Sex"] == 'Males')] #filter to only include Age Group 35-49 years and Male sex
Alberta_35to49M = Alberta_35to49M.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Alberta_35to49M = Alberta_35to49M.transpose() #final chart input
#MALE AGED 50-64
Alberta_50to64M = Alberta_main_frame_cut[(Alberta_main_frame_cut["Age group"] == '50 to 64 years') & (Alberta_main_frame_cut["Sex"] == 'Males')] #filter to only include Age Group 50-64 years and Male sex
Alberta_50to64M = Alberta_50to64M.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Alberta_50to64M = Alberta_50to64M.transpose() #final chart input
#MALE AGED 65+
Alberta_65plusM = Alberta_main_frame_cut[(Alberta_main_frame_cut["Age group"] == '65 years and over') & (Alberta_main_frame_cut["Sex"] == 'Males')] #filter to only include Age Group 65+ years and Male sex
Alberta_65plusM = Alberta_65plusM.pivot_table('VALUE', ['REF_DATE'], 'Indicators') #rotate table so it is more easily auditable
Alberta_65plusM = Alberta_65plusM.transpose() #final chart input
The following sub-dataframe appears as follows after an index reset, you can see that the attribute of Indicators has the following five survey responses: "Percieved health, fair or poor", "Perceived health, very good or excellent", "Percieved life stress, most days quite a bit or extremely stressful", "Perceived mental health, fair or poor", and "Perceived mental health, fair or excellent"
Alberta_65plusM.reset_index()
| REF_DATE | Indicators | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|---|
| 0 | Perceived health, fair or poor | 15.7 | 21.8 | 22.1 | 21.9 | 18.3 | 19.0 |
| 1 | Perceived health, very good or excellent | 56.6 | 43.6 | 46.3 | 45.6 | 48.8 | 50.5 |
| 2 | Perceived life stress, most days quite a bit o... | 9.4 | 13.7 | 13.6 | 10.6 | 9.8 | 9.2 |
| 3 | Perceived mental health, fair or poor | 2.5 | 6.4 | 6.1 | 3.8 | 5.5 | 5.5 |
| 4 | Perceived mental health, very good or excellent | 75.8 | 72.2 | 67.3 | 71.0 | 68.7 | 67.4 |
In order to adequately assess our second question and figure out what subpopulations had changing mental health, we utilized the percentage data within the survey. This data represents the percentage of survey respondants that provided the given answer on the survey. We created heat maps for for the subpopulations of age and gender in both Canada and Alberta. We felt it would be beneficial to compare these heatmaps to eachother. This helped us determine if the changing mental health picture in Alberta was independent of the wider economic situation in Canada, or if it was the result of the broader Canadian economy. In general, heatmaps are used to show relationships between two variables, one plotted on each axis. By observing how cell colors change across each axis, you can observe if there are any patterns in values for one or both variables. In our case the x-axis is time and the y-axis is the survey response. Plotting this data as a heatmap allowed us to more easily identify and visualize the differences or changes in each category over time, as well as the differences between age, and gender.
#the following block of code creates a heatmap for ALL OF CANADA for the following sub-dataframes
#The following sub-dataframes were created for Canada:
# 1) FEMALE 12 - 17
# 2) MALE 12 - 17
# 3) FEMALE 18 - 34
# 4) MALE 18 - 34
# 5) FEMALE 35 - 49
# 6) MALE 35 - 49
# 7) FEMALE 50 - 64
# 8) MALE 50 - 64
# 9) FEMALE 65+
# 10) MALE 65+
plt.figure(figsize=(10,25)) #set the size of the heatmap in aggregate
plt.suptitle('Distribtion of Mental Health Indicators within CANADA', fontsize=16, y=0.91)
#note that the formatting comments for subplot 1 are the same across all subplots and thus have not been repeated
plt.subplot(5,2,1) #the subplot specifies the nature of the grid which subplots will be placed, in our case the grid is in a 2 column by 5 row setup
ax = sns.heatmap(Canada_12to17F, cmap="YlGnBu", cbar=0, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white') #plot heatmap using seaborn and make asthetic changes and include annotations with 1 decimal point
ax.set(xlabel=None, ylabel=None, title='Female (Age: 12 - 17)') #set the y label to 'none' as the categorical survey responses are adequate, provide a title to ensure accurate comparisons, x label set to 'none' as years are self explanatory
plt.subplot(5,2,2)
ax2 = sns.heatmap(Canada_12to17M, cmap="YlGnBu", cbar_kws={'label': 'Percentage of Sample', 'orientation':'vertical', 'fraction':0.04}, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white', yticklabels=False)
ax2.set(xlabel=None, ylabel=None, title='Male (Age: 12 - 17)')
plt.subplot(5,2,3)
ax3 = sns.heatmap(Canada_18to34F, cmap="YlGnBu", cbar=0, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white')
ax3.set(xlabel=None, ylabel=None, title='Female (Age: 18 - 34)')
plt.subplot(5,2,4)
ax4 = sns.heatmap(Canada_18to34M, cmap="YlGnBu", cbar_kws={'label': 'Percentage of Sample', 'orientation':'vertical', 'fraction':0.04}, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white', yticklabels=False)
ax4.set(xlabel=None, ylabel=None, title='Male (Age: 18 - 34)')
plt.subplot(5,2,5)
ax5 = sns.heatmap(Canada_35to49F, cmap="YlGnBu", cbar=0, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white')
ax5.set(xlabel=None, ylabel=None, title='Female (Age: 35 - 49)')
plt.subplot(5,2,6)
ax6 = sns.heatmap(Canada_35to49M, cmap="YlGnBu", cbar_kws={'label': 'Percentage of Sample', 'orientation':'vertical', 'fraction':0.04}, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white', yticklabels=False)
ax6.set(xlabel=None, ylabel=None, title='Male (Age: 35 - 49)')
plt.subplot(5,2,7)
ax7 = sns.heatmap(Canada_50to64F, cmap="YlGnBu", cbar=0, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white')
ax7.set(xlabel=None, ylabel=None, title='Female (Age: 50 - 64)')
plt.subplot(5,2,8)
ax8 = sns.heatmap(Canada_50to64M, cmap="YlGnBu", cbar_kws={'label': 'Percentage of Sample', 'orientation':'vertical', 'fraction':0.04}, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white', yticklabels=False)
ax8.set(xlabel=None, ylabel=None, title='Male (Age: 50 - 64)')
plt.subplot(5,2,9)
ax9 = sns.heatmap(Canada_65plusF, cmap="YlGnBu", cbar=0, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white')
ax9.set(xlabel=None, ylabel=None, title='Female (Age: 65+)')
plt.subplot(5,2,10)
ax10 = sns.heatmap(Canada_65plusM, cmap="YlGnBu", cbar_kws={'label': 'Percentage of Sample', 'orientation':'vertical', 'fraction':0.04}, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white', yticklabels=False)
ax10.set(xlabel=None, ylabel=None, title='Male (Age: 65+)')
plt.subplots_adjust(wspace=.001, hspace= 0.4)
plt.savefig('images/CanBreak.png')
#the following block of code creates a heatmap for ALBERTA for the following sub-dataframes
#The following sub-dataframes were created for Alberta:
# 1) FEMALE 18 - 34
# 4) MALE 18 - 34
# 5) FEMALE 35 - 49
# 6) MALE 35 - 49
# 7) FEMALE 50 - 64
# 8) MALE 50 - 64
# 9) FEMALE 65+
# 10) MALE 65+
plt.figure(figsize=(10,25)) #set the size of the heatmap in aggregate
plt.suptitle('Distribtion of Mental Health Indicators within ALBERTA', fontsize=16, y=0.91)
#note that the formatting comments for subplot 1 are the same across all subplots and thus have not been repeated
plt.subplot(5,2,1) #the subplot specifies the nature of the grid which subplots will be placed, in our case the grid is in a 2 column by 5 row setup
ax1 = sns.heatmap(Alberta_18to34F, cmap="YlGnBu", cbar=0, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white') #plot heatmap using seaborn and make asthetic changes and include annotations with 1 decimal point
ax1.set(xlabel=None, ylabel=None, title='Female (Age: 18 - 34)') #set the y label to 'none' as the categorical survey responses are adequate, provide a title to ensure accurate comparisons, x label set to 'none' as years are self explanatory
plt.subplot(5,2,2)
ax2 = sns.heatmap(Alberta_18to34M, cmap="YlGnBu", cbar_kws={'label': 'Percentage of Sample', 'orientation':'vertical', 'fraction':0.04}, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white', yticklabels=False)
ax2.set(xlabel=None, ylabel=None, title='Male (Age: 18 - 34)')
plt.subplot(5,2,3)
ax3 = sns.heatmap(Alberta_35to49F, cmap="YlGnBu", cbar=0, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white')
ax3.set(xlabel=None, ylabel=None, title='Female (Age: 35 - 49)')
plt.subplot(5,2,4)
ax4 = sns.heatmap(Alberta_35to49M, cmap="YlGnBu", cbar_kws={'label': 'Percentage of Sample', 'orientation':'vertical', 'fraction':0.04}, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white', yticklabels=False)
ax4.set(xlabel=None, ylabel=None, title='Male (Age: 35 - 49)')
plt.subplot(5,2,5)
ax5 = sns.heatmap(Alberta_50to64F, cmap="YlGnBu", cbar=0, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white')
ax5.set(xlabel=None, ylabel=None, title='Female (Age: 50 - 64)')
plt.subplot(5,2,6)
ax6 = sns.heatmap(Alberta_50to64M, cmap="YlGnBu", cbar_kws={'label': 'Percentage of Sample', 'orientation':'vertical', 'fraction':0.04}, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white', yticklabels=False)
ax6.set(xlabel=None, ylabel=None, title='Male (Age: 50 - 64)')
plt.subplot(5,2,7)
ax7 = sns.heatmap(Alberta_65plusF, cmap="YlGnBu", cbar=0, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white')
ax7.set(xlabel=None, ylabel=None, title='Female (Age: 65+)')
plt.subplot(5,2,8)
ax8 = sns.heatmap(Alberta_65plusM, cmap="YlGnBu", cbar_kws={'label': 'Percentage of Sample', 'orientation':'vertical', 'fraction':0.04}, annot=True,fmt='.1f', square=True, linewidths=4.0, linecolor='white', yticklabels=False)
ax8.set(xlabel=None, ylabel=None, title='Male (Age: 65+)')
plt.subplots_adjust(wspace=.001, hspace= 0.4)
plt.savefig('images/ABBreak.png')
When analyzing these heatmaps we noticed that percentages perceived life stress within the Alberta heatmaps (especially in the female age 35-49 category) were significantly higher. We found it interesting and perhaps suggestive that economic stress puts the most pressure on females, because they traditionally have taken on more tasks at home and with children, especially when they are of childbearing age. Interestingly, if we look at the same female age group in Canada as a whole, we notice that the percentage of people who suffered from stress in Canada slightly decreased from 2015 - 2020. Also, the number of females (aged 35-49) in Canada who perceived their health as being very good or excellent increased by 2 percent, but in Alberta that same sex and age matched group (females aged 35-49) decreased by roughly 8 precent.
This analysis together with graphs in part one suggest that there is a clear correlation between Alberta's economic indexes and an increase in reported stress. Also, it can be concluded that the general decline in economic activity affected Canadian mental and physical health all over the country, but people living within Alberta suffered more severely relative to their sex and age matched counterparts.
Due to the fact that the quantity of people in every age group varies greatly, we decided to conduct an analysis on the normalized data in order to have an opportunity to confirm our conclusions above. Normalization is the process where each of the data points recorded in the dataset will be transformed into the range between 0 and 1. The normalization formula is as follows:
This procedure will help to approximate the "most impacted" age groups more correctly. Below there are the bar charts for every age group (both sexes are displayed together) with normalised values. The x-axis contains years and the y-axis represents a normalized value for every age group and type of disorder.
#pandas groupby REF_DATE, Age group, Indicators, as well as sex
reportedMH_agg_sex = ABreportedMH.groupby(['REF_DATE', 'Age group', 'Indicators', 'Sex']).sum().drop(['UOM_ID', 'SCALAR_ID','SYMBOL', 'TERMINATED', 'DECIMALS'], axis = 1)
reportedMH_agg_sex.reset_index(inplace=True)
#rename column REF_DATE as year
reportedMH_agg_sex = reportedMH_agg_sex.rename(columns={'REF_DATE': 'Year'})
#filter for mood disorders segregated by sex
mood_disorder_sex = reportedMH_agg_sex[reportedMH_agg_sex['Indicators'] == 'Mood disorder']
#filter for percieved life stress segregated by sex
stress_with_sex = reportedMH_agg_sex[reportedMH_agg_sex['Indicators'] == 'Perceived life stress, most days quite a bit or extremely stressful']
#filter for percieved health, fair or poor segregated by sex
poor_with_sex = reportedMH_agg_sex[reportedMH_agg_sex['Indicators'] == 'Perceived health, fair or poor']
#filter for heavy drinking segregated by sex
heavy_drinking_sex = reportedMH_agg_sex[reportedMH_agg_sex['Indicators'] == 'Heavy drinking']
#filter for percieved mental health, very good or excellent segregated by sex
good_with_sex = reportedMH_agg_sex[reportedMH_agg_sex['Indicators'] == 'Perceived mental health, very good or excellent']
# Get dataframe with selected disorders
options = ['Mood disorder', 'Perceived life stress, most days quite a bit or extremely stressful','Perceived health, fair or poor', 'Heavy drinking' ]
all_disorders = reportedMH_agg_sex.loc[reportedMH_agg_sex['Indicators'].isin(options)]
#create new graphing datasets
all_disorders_12_17= all_disorders[all_disorders['Age group'] == '12 to 17 years']
all_disorders_18_34= all_disorders[all_disorders['Age group'] == '18 to 34 years']
all_disorders_35_49= all_disorders[all_disorders['Age group'] == '35 to 49 years']
all_disorders_50_64= all_disorders[all_disorders['Age group'] == '50 to 64 years']
all_disorders_65= all_disorders[all_disorders['Age group'] == '65 years and over']
# calculate the normalized data using the formula x = (x-x_min)/(x_max-x_min).
all_disorders_12_17 = all_disorders_12_17.assign(Norm_Value=(all_disorders_12_17.VALUE-all_disorders_12_17.VALUE.min())/(all_disorders_12_17.VALUE.max()-all_disorders_12_17.VALUE.min()))
all_disorders_18_34 = all_disorders_18_34.assign(Norm_Value=(all_disorders_18_34.VALUE-all_disorders_18_34.VALUE.min())/(all_disorders_18_34.VALUE.max()-all_disorders_18_34.VALUE.min()))
all_disorders_35_49 = all_disorders_35_49.assign(Norm_Value=(all_disorders_35_49.VALUE-all_disorders_35_49.VALUE.min())/(all_disorders_35_49.VALUE.max()-all_disorders_35_49.VALUE.min()))
all_disorders_50_64 = all_disorders_50_64.assign(Norm_Value=(all_disorders_50_64.VALUE-all_disorders_50_64.VALUE.min())/(all_disorders_50_64.VALUE.max()-all_disorders_50_64.VALUE.min()))
all_disorders_65 = all_disorders_65.assign(Norm_Value=(all_disorders_65.VALUE-all_disorders_65.VALUE.min())/(all_disorders_65.VALUE.max()-all_disorders_65.VALUE.min()))
#plot visualizations
fig1 = px.bar(all_disorders_12_17, x="Year", y="Norm_Value", color='Indicators', title='All Disorders Together - 12 to 17 years', )
fig1.update_yaxes(range = [0,4.5] )
fig1.show()
fig2 = px.bar(all_disorders_18_34, x="Year", y="Norm_Value", color='Indicators', title='All Disorders Together - 18 to 34 years')
fig2.update_yaxes(range = [0,4.5] )
fig2.show()
fig3 = px.bar(all_disorders_35_49, x="Year", y="Norm_Value", color='Indicators', title='All Disorders Together - 35 to 49 years')
fig3.update_yaxes(range = [0,4.5] )
fig3.show()
fig4 = px.bar(all_disorders_50_64, x="Year", y="Norm_Value", color='Indicators', title='All Disorders Together - 50 to 64 years')
fig4.update_yaxes(range = [0,4.5] )
fig4.show()
fig5 = px.bar(all_disorders_65, x="Year", y="Norm_Value", color='Indicators', title='All Disorders Together - 65 years and over')
fig5.update_yaxes(range = [0,4.5] )
fig5.show()
fig1.write_image('images/AllDisorders1217.png')
fig2.write_image('images/AllDisorders1834.png')
fig3.write_image('images/AllDisorders3549.png')
fig4.write_image('images/AllDisorders5064.png')
fig5.write_image('images/AllDisorders65.png')
Having completed the analysis of the data pertaining to cohorts segregated by sex and age, we can conclude that:
The data is first read in via seperate csv files. On the CIHR website the mental health hospital data is only in a standardized format back to 2017. Unforunately each csv file has different columns so we were unable to write a loop which would bring in all of the data in a standardized method. This was a great challenge and it required that we merge together seperate files to create larger dataframes on which we perfomed our visualizations. In order to assess this question we utilized the Hospitalizations in Canada Supplementary Tables which are released annually by the Canadian Institute for Health Information (CIHI). Within the CIHI data, we utilized the attributes ‘number of discharges’ and ‘crude discharge rate per 100,000’ and searched for patterns that would suggest a rise in mental health issues within Alberta.
In order to assist the reader in understanding the "read in" of these various files, we have included a list which outlines what attributes are included within each dataframe.
General_hosp_xx_xx.csv; Diagnosis category based on primary diagnosis, Discharges, Percentage of discharges, Median length of stay (days), Average length of stay (days) [ALL FOR GENERAL HOSPITALS IN CANADA]
Psych_hosp_xx_xx.csv; Diagnosis category based on primary diagnosis, Discharges, Percentage of discharges, Median length of stay (days), Average length of stay (days) [ALL FOR PSYCHIATRIC WARDS IN CANADA]
urlPath = githubRepo + 'Hospital%20Based%20Data/'
General_hosp_17_18 = pd.read_csv(urlPath + 'General_hosp_17_18.csv')
General_hosp_18_19 = pd.read_csv(urlPath + 'General_hosp_18_19.csv')
General_hosp_19_20 = pd.read_csv(urlPath + 'General_hosp_19_20.csv')
Psych_hosp_17_18 = pd.read_csv(urlPath + 'Psych_hosp_17_18.csv')
Psych_hosp_18_19 = pd.read_csv(urlPath + 'Psych_hosp_18_19.csv')
Psych_hosp_19_20 = pd.read_csv(urlPath + 'Psych_hosp_19_20.csv')
Combined_stats_17_18 = pd.read_csv(urlPath + 'Combined_stats_17_18.csv')
Combined_stats_18_19 = pd.read_csv(urlPath + 'Combined_stats_18_19.csv')
Combined_stats_19_20 = pd.read_csv(urlPath + 'Combined_stats_19_20.csv')
Combined_stats_by_sex_17_18 = pd.read_csv(urlPath + 'AB_Combined_stats_by_sex_17_18.csv')
Combined_stats_by_sex_18_19 = pd.read_csv(urlPath + 'AB_Combined_stats_by_sex_18_19.csv')
Combined_stats_by_sex_19_20 = pd.read_csv(urlPath + 'AB_Combined_stats_by_sex_19_20.csv')
Readmission_17_18 = pd.read_csv(urlPath + 'Readmission_17_18.csv')
Readmission_18_19 = pd.read_csv(urlPath + 'Readmission_18_19.csv')
Readmission_19_20 = pd.read_csv(urlPath + 'Readmission_19_20.csv')
We show the general hospital dataframe here as a sample
General_hosp_17_18.head(4)
| Diagnosis category based on \nprimary diagnosis | Discharges | Percentage \nof discharges | Median length of stay \n(days) | Average length of stay \n(days) | 0.5% trimmed average (days) | Total length of stay (days) | |
|---|---|---|---|---|---|---|---|
| 0 | Organic disorders | 36370 | 16.21 | 15 | 31.15 | 28.77 | 1132749 |
| 1 | Substance-related disorders | 44133 | 19.66 | 3 | 6.41 | 5.66 | 283062 |
| 2 | Schizophrenic and psychotic disorders | 40683 | 18.13 | 12 | 22.42 | 20.53 | 912033 |
| 3 | Mood disorders | 53346 | 23.77 | 8 | 15.05 | 14.14 | 802645 |
Because the .csv files all come from different places, it was required that we append additional columns which indicate the year the dataset came from, as well as the hospital type (psychiatric or general). Taking this step ensured that our larger concatenated dataframe would be able to create visualizations that were segregated by year as well as diagnosis category based on primary diagnosis.
General_hosp_17_18['Date'] = ['2017-2018']*(General_hosp_17_18.shape[0])
General_hosp_17_18['Hospital Type'] = ['General']*(General_hosp_17_18.shape[0])
General_hosp_18_19['Date'] = ['2018-2019']*(General_hosp_18_19.shape[0])
General_hosp_18_19['Hospital Type'] = ['General']*(General_hosp_18_19.shape[0])
General_hosp_19_20['Date'] = ['2019-2020']*(General_hosp_19_20.shape[0])
General_hosp_19_20['Hospital Type'] = ['General']*(General_hosp_19_20.shape[0])
Psych_hosp_17_18['Date'] = ['2017-2018']*(Psych_hosp_17_18.shape[0])
Psych_hosp_17_18['Hospital Type'] = ['Psych']*(Psych_hosp_17_18.shape[0])
Psych_hosp_18_19['Date'] = ['2018-2019']*(Psych_hosp_18_19.shape[0])
Psych_hosp_18_19['Hospital Type'] = ['Psych']*(Psych_hosp_18_19.shape[0])
Psych_hosp_19_20['Date'] = ['2019-2020']*(Psych_hosp_19_20.shape[0])
Psych_hosp_19_20['Hospital Type'] = ['Psych']*(Psych_hosp_19_20.shape[0])
We show the general hospital dataframe here as a sample after adding columns
General_hosp_17_18.head(4)
| Diagnosis category based on \nprimary diagnosis | Discharges | Percentage \nof discharges | Median length of stay \n(days) | Average length of stay \n(days) | 0.5% trimmed average (days) | Total length of stay (days) | Date | Hospital Type | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Organic disorders | 36370 | 16.21 | 15 | 31.15 | 28.77 | 1132749 | 2017-2018 | General |
| 1 | Substance-related disorders | 44133 | 19.66 | 3 | 6.41 | 5.66 | 283062 | 2017-2018 | General |
| 2 | Schizophrenic and psychotic disorders | 40683 | 18.13 | 12 | 22.42 | 20.53 | 912033 | 2017-2018 | General |
| 3 | Mood disorders | 53346 | 23.77 | 8 | 15.05 | 14.14 | 802645 | 2017-2018 | General |
In order to create visualizations that encompassed data back to 2017, we concatenated the dataframes in a vertical fashion, as the columns were the same across files, we then removed columns that were unneccessary to simplify the dataframes which we were working with.
#concatenate vertically across the 3 general hospital dataframes
General_hosp_concat = pd.concat([General_hosp_17_18, General_hosp_18_19, General_hosp_19_20])
#concatenate vertically across the 3 psychiatric hospital dataframes
Psych_hosp_concat = pd.concat([Psych_hosp_17_18, Psych_hosp_18_19, Psych_hosp_19_20])
#concatenate the concatenated general hospital dataframes and the concatenated psychiatric hospital dataframes
Hosp_concat = pd.concat([Psych_hosp_concat, General_hosp_concat])
#drop unncessary columns
Hosp_concat = Hosp_concat.drop(columns=['0.5% trimmed mean \n(days)',
'Total length of stay (days)',
'0.5% trimmed average (days)'],axis=1)
Hosp_concat.head(4)
| Diagnosis category based on \nprimary diagnosis | Discharges | Percentage \nof discharges | Median length of stay \n(days) | Average length of stay \n(days) | Date | Hospital Type | |
|---|---|---|---|---|---|---|---|
| 0 | Organic disorders | 1241 | 4.30 | 60.0 | 161.28 | 2017-2018 | Psych |
| 1 | Substance-related disorders | 5901 | 20.44 | 15.0 | 27.06 | 2017-2018 | Psych |
| 2 | Schizophrenic and psychotic disorders | 8681 | 30.07 | 33.0 | 123.87 | 2017-2018 | Psych |
| 3 | Mood disorders | 6607 | 22.88 | 20.0 | 43.14 | 2017-2018 | Psych |
Because the .csv files all come from different places, it was required that we again append additional columns which indicate the year the dataset came from. Taking this step ensured that our larger concatenated dataframe would be able to create visualizations that were segregated by year. The dataframe Combined_stats_xx_xx.csv for each year contains the number of discharges and the age-standardized discharge rate per 100,000 of population for each province within Canada.
Combined_stats_17_18['Date'] = ['2017-2018']*(Combined_stats_17_18.shape[0])
Combined_stats_18_19['Date'] = ['2018-2019']*(Combined_stats_18_19.shape[0])
Combined_stats_19_20['Date'] = ['2019-2020']*(Combined_stats_19_20.shape[0])
Combined_stats_concat = pd.concat([Combined_stats_17_18, Combined_stats_18_19, Combined_stats_19_20])
Combined_stats_concat.head(4)
| Province/territory | Discharges | Age-standardized discharge rate per 100,000 population | Date | |
|---|---|---|---|---|
| 0 | Newfoundland and Labrador | 3549 | 689.70 | 2017-2018 |
| 1 | Prince Edward Island | 1348 | 911.26 | 2017-2018 |
| 2 | Nova Scotia | 6008 | 626.94 | 2017-2018 |
| 3 | New Brunswick | 6037 | 795.44 | 2017-2018 |
Because the .csv files all come from different places, it was required that we again append additional columns which indicate the year the dataset came from. Taking this step ensured that our larger concatenated dataframe would be able to create visualizations that were segregated by year. The dataframe AB_Combined_stats_by_sex_xx_xx.csv for each year contains the number of discharges and the age-standardized discharge rate per 100,000 of population for each province within Canada. The data is speperated into diagnosis categories (based on primary psychiatric diagnosis)
Combined_stats_by_sex_17_18['Date'] = ['2017-2018']*(Combined_stats_by_sex_17_18.shape[0])
Combined_stats_by_sex_18_19['Date'] = ['2018-2019']*(Combined_stats_by_sex_18_19.shape[0])
Combined_stats_by_sex_19_20['Date'] = ['2019-2020']*(Combined_stats_by_sex_19_20.shape[0])
Combined_stats_by_sex = pd.concat([Combined_stats_by_sex_17_18, Combined_stats_by_sex_18_19, Combined_stats_by_sex_19_20])
Combined_stats_by_sex.head(4)
| Sex | Diagnosis category based on \nprimary diagnosis | Percentage \nof discharges | Crude discharge rate per \n100,000 population | Average length of stay \n(days) | Total length of stay \n(days) | Date | |
|---|---|---|---|---|---|---|---|
| 0 | Female | Organic disorders | 16.3 | 110.06 | 31.74 | 646329 | 2017-2018 |
| 1 | Female | Substance-related disorders | 12.92 | 87.23 | 8.26 | 133332 | 2017-2018 |
| 2 | Female | Schizophrenic and psychotic disorders | 14.82 | 100.08 | 38.42 | 711296 | 2017-2018 |
| 3 | Female | Mood disorders | 28.05 | 189.39 | 17.78 | 623142 | 2017-2018 |
Because the .csv files all come from different places, it was required that we again append additional columns which indicate the year the dataset came from. Taking this step ensured that our larger concatenated dataframe would be able to create visualizations that were segregated by year. The dataframe Combined_stats_by_sex_xx_xx.csv for each year contains the number of discharges and the age-standardized discharge rate per 100,000 of population for each province within Canada.
Readmission_17_18['Date'] = ['2017-2018']*(Readmission_17_18.shape[0])
Readmission_18_19['Date'] = ['2018-2019']*(Readmission_18_19.shape[0])
Readmission_19_20['Date'] = ['2019-2020']*(Readmission_19_20.shape[0])
Readmission_concat = pd.concat([Readmission_17_18, Readmission_18_19, Readmission_19_20])
Readmission_concat.head(4)
| Province/territory | Region | 30-day readmission for mood disorders, risk-adjusted rate (per 100), 2017–2018 | 95% confidence interval,\n2017–2018 | Date | 30-day readmission for mood disorders, risk-adjusted rate (per 100), 2018–2019 | 95% confidence interval,\n2018–2019 | 30-day readmission for mood disorders, risk-adjusted rate (per 100), 2019–2020 | 95% confidence interval,\n2019–2020 | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Newfoundland and Labrador | Eastern Health | 8.7 | (5.8–12.7) | 2017-2018 | NaN | NaN | NaN | NaN |
| 1 | Newfoundland and Labrador | Central Health | 11.0 | (6.4–17.6) | 2017-2018 | NaN | NaN | NaN | NaN |
| 2 | Newfoundland and Labrador | Western Health | 10.5 | (7.0–15.2) | 2017-2018 | NaN | NaN | NaN | NaN |
| 3 | Newfoundland and Labrador | Labrador–Grenfell Health | 12.0 | (7.0–19.2) | 2017-2018 | NaN | NaN | NaN | NaN |
#remove the unncessary row that aggregates all data for Canada from the concatenated dataframe
Combined_stats_concat.drop(Combined_stats_concat.index[Combined_stats_concat['Province/territory'] == 'Canada (Organic disorders included)'], inplace = True)
Combined_stats_concat.head(10)
| Province/territory | Discharges | Age-standardized discharge rate per 100,000 population | Date | |
|---|---|---|---|---|
| 0 | Newfoundland and Labrador | 3549 | 689.70 | 2017-2018 |
| 1 | Prince Edward Island | 1348 | 911.26 | 2017-2018 |
| 2 | Nova Scotia | 6008 | 626.94 | 2017-2018 |
| 3 | New Brunswick | 6037 | 795.44 | 2017-2018 |
| 4 | Quebec | 58027 | 682.79 | 2017-2018 |
| 5 | Ontario | 93641 | 657.22 | 2017-2018 |
| 6 | Manitoba | 7321 | 550.44 | 2017-2018 |
| 7 | Saskatchewan | 8420 | 742.95 | 2017-2018 |
| 8 | Alberta | 28339 | 688.56 | 2017-2018 |
| 9 | British Columbia | 39465 | 813.54 | 2017-2018 |
#prior to plotting, filter out the rows that contain totals for all of Canada and remove the rows that contain unknown disorders as we are focused on the disorders related clearly to economic stress
Combined_stats_by_sex.drop(Combined_stats_by_sex.index[Combined_stats_by_sex['Diagnosis category based on \nprimary diagnosis'] == 'Total (Organic disorders included)'], inplace = True)
Combined_stats_by_sex.drop(Combined_stats_by_sex.index[Combined_stats_by_sex['Diagnosis category based on \nprimary diagnosis'] == 'Unknown disorders'], inplace = True)
Combined_stats_by_sex.head(4)
| Sex | Diagnosis category based on \nprimary diagnosis | Percentage \nof discharges | Crude discharge rate per \n100,000 population | Average length of stay \n(days) | Total length of stay \n(days) | Date | |
|---|---|---|---|---|---|---|---|
| 0 | Female | Organic disorders | 16.3 | 110.06 | 31.74 | 646329 | 2017-2018 |
| 1 | Female | Substance-related disorders | 12.92 | 87.23 | 8.26 | 133332 | 2017-2018 |
| 2 | Female | Schizophrenic and psychotic disorders | 14.82 | 100.08 | 38.42 | 711296 | 2017-2018 |
| 3 | Female | Mood disorders | 28.05 | 189.39 | 17.78 | 623142 | 2017-2018 |
As was showed previously in Question 1, there were varying economic conditions between 2017 and 2020 in Alberta. We wanted to see if there was a relative increase in the number of discharges within Alberta over the period of 2017 to 2020, and as you can see below, it is obvious that from 2017 to 2020 there was no relative changes in absolute discharge numbers across the country. This was quite puzzling to us, and a little bit underwhelming… and we thought about why that could be… and we realized it is likely due to the fact that hospitals are usually operating at or near capacity, especially within emergency services, which is typically where mental health patients are discharged from.
labels = ["Newfoundland and Labrador", "Prince Edward Island", "Nova Scotia", "New Brunswick", "Quebec", "Ontario",
"Manitoba", "Saskatchewan", "Alberta", "British Columbia"]
# Create subplots: use 'domain' type for Pie subplot
fig_pie = make_subplots(rows=1, cols=3, specs=[[{'type':'domain'}, {'type':'domain'}, {'type':'domain'}]])
fig_pie.add_trace(go.Pie(labels=labels, values=Combined_stats_17_18['Discharges'], name="2017-2018"),
1, 1)
fig_pie.add_trace(go.Pie(labels=labels, values = Combined_stats_18_19['Discharges'], name="2018-2019"),
1, 2)
fig_pie.add_trace(go.Pie(labels=labels, values = Combined_stats_19_20['Discharges'], name="2019-2020"),
1, 3)
# Use `hole` to create a donut-like pie chart
fig_pie.update_traces(hole=.4, hoverinfo="label+percent+name")
fig_pie.update_layout(
title_text="Relative Number of Psychiatric Discharges from All Hospitals Across Canada (2017 - 2020)",
# Add annotations in the center of the donut pies.
annotations=[dict(text='2017-2018', x=0.12, y=0.5, font_size=12, showarrow=False),
dict(text='2018-2019', x=0.50, y=0.5, font_size=12, showarrow=False),
dict(text='2019-2020', x=0.88, y=0.5, font_size=12, showarrow=False)])
fig_pie.show()
fig_pie.write_image('images/Discharge.png')
fig_dis = px.bar(Combined_stats_concat,
x="Discharges",
y="Province/territory",
color="Date",
barmode = 'group',
title="Number of Psychiatric Discharges from All Hospitals Across Canada (2017-2020)")
fig_dis.show()
fig_dis.write_image('images/Discharge1.png')
We weren’t entirely satisfied with merely concluding there were no relative changes in discharges across the country. We decided to dig deeper. We were able to merge multiple datasets together to get a look at separated diagnosis categories within Alberta. The legend and y axis below show you see organic disorders (which are disorders that occur naturally, so think of things like dementia), as well as substance abuse-related disorders, schizophrenic and psychotic disorders, mood disorders, anxiety disorders, personality disorders, and other disorders. The horizontal bar chart is separated on the x and y axis, the x axis includes the date from 2017 to 2020 and on the y axis it includes male and female sex. To clarify, this data is specifically for the province of Alberta. Along the x axis is the label stating crude discharge rate per 100,000. What we noticed was anxiety disorders, mood disorders, substance-related disorders, and personality disorders all show a clear increase in the crude discharge rate per 100,000. Those are all disorders you would suspect someone to have a higher incidence of if they were under extreme stress that may be put on them by the volatile Alberta economy.
Conversely when analyzing the organic disorders (which includes things such as dementia and mental health disorders related to aging), it is no surprise that those crude discharge rates are relatively flat over the timespan. Another interesting finding is the rise in substance related disorders in men... it is very large, whereas in females that is not as pronounced, and conversely the rise in anxiety disorders in females is much higher than the rise in anxiety disorders in males.
We felt this chart in particular adequately answered our question as it shows trends within Alberta, the difference between subgroups such as males and females, and can be correlated with the varying economic data within the timeframe which was discussed in Question 1.
# due to an issue reading in the data, our x axis needed to be concerted to a float
Combined_stats_by_sex['Crude discharge rate per \n100,000 population'] = Combined_stats_by_sex['Crude discharge rate per \n100,000 population'].astype(float)
fig_crude = px.bar(Combined_stats_by_sex, x="Crude discharge rate per \n100,000 population", y="Diagnosis category based on \nprimary diagnosis",
labels={"Diagnosis category based on \nprimary diagnosis": "Diagnosis",
"Crude discharge rate per \n100,000 population": "Discharge"},
title="Alberta Hospital Crude Discharge per 100,000 population For Various Primary Mental Health Disorders",
color="Diagnosis category based on \nprimary diagnosis",
barmode="group",
facet_row="Sex",
facet_col="Date",
category_orders={"Date": ["2017-2018", "2018-2019", "2019-2020"],
"Sex": ["Male", "Female"]}, orientation='h')
fig_crude.show()
fig_crude.write_image('images/Discharge2.png')
There was no increase in relative discharges within Alberta during periods of economic downturn, this was likely due to the fact that emergency services within the hospital has been operating at full capacity for a significant period of time. When looking deeper into crude discharges per 100,000 of population, there is a clear rise in anxiety disorders, substance related disorders, mood disorders and personality disorders. There are distinguishable differences between males and females. Males are more likely to incur a substance abuse disorder, whereas females are more likely to experience an anxiety disorder. It can be speculated that at least some of this increase in prevalence was due to economic conditions... but as was mentioned earlier in the analysis correlation does not fully imply causation.
From this analysis, we were able to identify key relationships between economic volatility within the province of Alberta and the growing mental health crisis in this region. Key economic events in 2016 and 2020 triggered downward trends in economic activity and updward trends in negative mental health indicators. Of these, more people were reporting that they experienced more stressful days than not and higher rates of 'fair' and 'poor' mental health perpectives overall. In a deeper analysis, it was found that those aged 35-49 years old were the most adversely impacted by economic downturns in terms of mental health. As the working generation with children at a schooling age, this subgroup may feel the effects of economic shifts more than those that are younger and early in their careers. Interestingly, females were more adversely impacted then men were in terms of mental health. Lastly, we found through hospital admissions data that anxiety disorders and substance related disorders both have the largest increase in reported cases since 2017, which could be a result of economic hardships in the province.
What we drew from this analysis is the importance of understanding day-to-day economic indicators and the impact they can have on those around us. When we notice that gas prices might be going up or that inflation is hurting our wallets, those around us may be feeling it a little more then we are. When things get hard, it is always important to treat others with kindness and respect as the day-to-day battle with mental health could effect anyone such as friends, family, and strangers. The insights gained from this exploration are key in our understanding of these concepts and key in how we choose to treat others and ourselves in the future as we try to navigate trying times as a community.
Lastly, the next steps for this analysis is to build on these trends and identify other subgroups which may be adversely impacted. This effort could lead local community leaders to offer aid to those affected. Subgroups could include race and occupational backgrounds most effected by economic and mental health hardship. Likewise, a deeper dive into hospitalizations and mental health trends during COVID-19 would help solidify further conclusions. Alternative next steps could include the provincial government utilizing the trends within mental health data to improve the allocation of healthcare budgets. Perhaps the government could look at conducting a feasbiility assessment for social outreach cetners to help address stress and anxiety, particularily within patients aged 35-49. Social programs could also be developed to address declining mental health indicators when an economic downturn is occuring.